DNA cytosine methylation is the addition of a methyl group to a cytosine base in DNA. It affects transcription and therefore plays a major role in several vital processes. In mammals, DNA cytosine methylation occurs predominantly in the CG sequence context. In plants, methylation is also common in the CHG and CHH contexts (H = A, C, or T).
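The three sequence contexts can be made concrete with a minimal sketch that classifies a cytosine by its two downstream bases. The function name `cytosine_context` is hypothetical and is not part of Methylator. The sketch only considers the forward strand; cytosines on the reverse strand would need the reverse complement of the sequence.

```python
from typing import Optional


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the forward-strand cytosine at seq[pos] as CG, CHG, or CHH.

    H stands for A, C, or T. Returns None if seq[pos] is not a cytosine or
    if the downstream bases needed for the decision are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1 : pos + 3]  # up to two bases after the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return None


if __name__ == "__main__":
    example = "ACGTCAGCTT"
    for i, base in enumerate(example):
        if base == "C":
            print(i, cytosine_context(example, i))
```

Cytosines too close to the end of the sequence, or followed by an ambiguous base such as N, are reported as `None` rather than guessed.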
Various tools have been introduced to facilitate the analysis of DNA cytosine methylation data. Most of them cover only a small part of the workflow, which leaves users with a considerable amount of work: evaluating appropriate tools, transforming intermediate output, and finally generating publication-ready figures. In addition, many tools are limited in the input data they accept and support only certain species and/or specific library preparation methods.
Here we introduce Methylator, a user-friendly tool for complete DNA cytosine methylation analysis that offers an easy-to-use interface, supports reproducibility, and provides interactive visualizations of results.
Although technical duplicate reads can arise from different sources, most deduplication tools focus on PCR duplicates. Clumpify can also handle other types of duplicates in sequencing data, such as optical duplicates. Because the abundance of each duplicate type depends on the sequencing technology used, Methylator adapts duplicate removal according to the user input.
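As an illustration of how optical duplicates differ from PCR duplicates, the sketch below flags reads that share a sequence with an earlier read and lie close to it on the same flowcell tile. This is not Clumpify's implementation. It assumes Illumina-style read names (`instrument:run:flowcell:lane:tile:x:y`), and the function names and the default distance threshold `max_dist` are hypothetical; a suitable threshold depends on the sequencing platform.

```python
from collections import defaultdict
from math import hypot


def parse_position(read_name):
    """Extract (lane, tile, x, y) from an Illumina-style read name."""
    fields = read_name.lstrip("@").split()[0].split(":")
    return fields[3], fields[4], int(fields[5]), int(fields[6])


def flag_optical_duplicates(reads, max_dist=40):
    """Return the names of reads that optically duplicate an earlier read.

    reads: iterable of (read_name, sequence) pairs.
    Two reads count as optical duplicates here when they have the same
    sequence, lane, and tile, and their clusters lie within max_dist
    (an assumed threshold) of each other.
    """
    kept = defaultdict(list)  # (sequence, lane, tile) -> coordinates of kept reads
    duplicates = []
    for name, seq in reads:
        lane, tile, x, y = parse_position(name)
        key = (seq, lane, tile)
        if any(hypot(x - kx, y - ky) <= max_dist for kx, ky in kept[key]):
            duplicates.append(name)
        else:
            kept[key].append((x, y))
    return duplicates
```

Identical reads that are far apart on a tile, or on different tiles, are not flagged by this sketch; these are the cases that a PCR-duplicate filter would have to handle instead.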
Bisulfite treatment of DNA reduces sequence complexity, which lowers mapping efficiency. As a result, a large proportion of the sequencing data is lost to analysis.
The Dirty Harry method improves the mapping rate by locally remapping the reads that failed to align. This increases mapping efficiency and retains a considerable number of cytosine sites that would otherwise be lost.
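The loss of sequence complexity caused by bisulfite treatment can be illustrated in silico. The sketch below converts every unmethylated cytosine on the forward strand to thymine and counts distinct k-mers before and after conversion. The function names and the toy sequence are hypothetical, and the conversion is idealized (complete conversion, forward strand only).

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Idealized in-silico bisulfite conversion of the forward strand.

    Unmethylated cytosines read as T after conversion; cytosines at the
    0-based positions listed in `methylated` are protected and stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def distinct_kmers(seq, k):
    """Count the distinct substrings of length k in seq."""
    return len({seq[i : i + k] for i in range(len(seq) - k + 1)})


if __name__ == "__main__":
    original = "CATTATCAT"
    converted = bisulfite_convert(original)
    print(converted)                     # TATTATTAT
    print(distinct_kmers(original, 3))   # 6
    print(distinct_kmers(converted, 3))  # 3
```

In this toy example the converted sequence contains half as many distinct 3-mers as the original, so short stretches of it match more positions, which is the effect that makes bisulfite reads harder to place uniquely.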
For each type of analysis, several outputs are generated and visualized in a single interactive Shiny app. Publication-ready figures are created using colourblind-friendly palettes. Each plot can be customized and downloaded individually by the user.
| Feature | Methylator | methylseq | ARPEGGIO | MethylStar | MethylC-analyzer |
|---|---|---|---|---|---|
| Platform independent | ✓ | ✓ | ✕ | ✓ | ✕ |
| Self-contained | ✓ | ✓ | ✓ | ✓ | ✕ |
| Interface | GUI (SUSHI, Galaxy, Shiny), CLI | CLI | CLI | CLI | CLI, GUI |
| Input Data | WGBS, RRBS, PBAT, TAPS, ABBS | WGBS, RRBS | WGBS | WGBS, PBAT, single-cell | WGBS, RRBS |
| Supported genomes | Mammals, Plants (incl. Polyploids) | General | Polyploids | Mammals, Plants | Mammals, Plants |
| Quality control | ✓ | ✓ | ✓ | ✓ | ✕ |
| Alignment | Bismark (incl. Dirty Harry), Arioc (GPU-based), EAGLE-RC | Bismark, bwa-meth | Bismark, EAGLE-RC | Bismark | ✕ |
| Deduplication | Clumpify | Bismark, Picard | Bismark | Bismark | ✕ |
| Methylation context | All | All | All | All | All |
| Exploratory data analysis | PCA, heatmaps, methylation summaries | ✕ | ✕ | ✕ | PCA, heatmaps, methylation summaries |
| Differential methylation analysis | DMRs, DMLs | ✕ | ✓ | ✕ | DMRs and DMGs |
| Copy number variation analysis | CNVkit | ✕ | ✕ | ✕ | ✕ |
| Functional analysis | GO, KEGG, Reactome, user-defined | ✕ | ✕ | ✕ | ✕ |
| Motif analysis | Homer | ✕ | ✕ | ✕ | ✕ |
| Visualization | ✓ | ✕ | ✕ | ✕ | ✓ |
| Year published | In development | 2020 | 2021 | 2020 | 2023 |
Table 1: Feature comparison of Methylator with widely used tools.
GUI = graphical user interface, CLI = command line interface
DMR/DML/DMG = Differentially Methylated Regions/Loci/Genes
GO = Gene Ontology